library(nycflights13)
library(tidyverse)

5.2 Filter rows with filter()

5.2.4 Exercises

  1. Find all flights that:
  1. Had an arrival delay of two or more hours
filter(flights, arr_delay >= 120)
  1. Flew to Houston (IAH or HOU)
filter(flights, dest %in% c("IAH", "HOU"))
  1. Were operated by United, American, or Delta
airlines
filter(flights, carrier %in% c("UA", "AA", "DL"))
  1. Departed in summer (July, August, and September)
filter(flights, month %in% c(7, 8, 9))
  1. Arrived more than two hours late, but didn’t leave late
filter(flights, arr_delay > 120, dep_delay <= 0)
  1. Were delayed by at least an hour, but made up over 30 minutes in flight
filter(flights, arr_delay >= 60, arr_delay - dep_delay < -30)
  1. Departed between midnight and 6am (inclusive)
filter(flights, dep_time <= 600 | dep_time == 2400)
  1. Another useful dplyr filtering helper is between(). What does it do? Can you use it to simplify the code needed to answer the previous challenges?

between() returns items that arehave a variable value between two boundary values (inclusive, ie it tests for >= and <= on the left and right boundaries).

  1. Departed in summer (July, August, and September)
filter(flights, between(month, 7, 9))
  1. How many flights have a missing dep_time? What other variables are missing? What might these rows represent?
filter(flights, is.na(dep_time))

These rows are also missing dep_delay, arr_time, arr_delay and air_time. Given that they have all the scheduled details but are missing all actual flight data, these rows appear to represent cancelled flights.

  1. Why is NA ^ 0 not missing? Why is NA | TRUE not missing? Why is FALSE & NA not missing? Can you figure out the general rule? (NA * 0 is a tricky counterexample!)
NA ^ 0
[1] 1
NA | TRUE
[1] TRUE
NA | FALSE # Counter-example
[1] NA
FALSE & NA
[1] FALSE
TRUE & NA # Counter-example
[1] NA
NA * 0
[1] NA

NA ^ 0 is not missing because any value to the power of zero equals 1 (although I don’t have an explanation right now why the same principle doesn’t apply for NA * 0). NA | TRUE is not missing because only one side of the ‘or’ operator needs to evaluate as true (conversely, note that NA | FALSE is missing). FALSE & NA is not missing because both sides of the ‘and’ operator would need to evaluate TRUE for it to be true so based on the right-hand side it would be false regardless of the actual value of the NA (conversely, note that TRUE & NA is missing).

5.3 Arrange rows with arrange()

5.3.1 Exercises

  1. How could you use arrange() to sort all missing values to the start? (Hint: use is.na()).

Using dep_time as an example.

arrange(flights, desc(is.na(dep_time)))
  1. Sort flights to find the most delayed flights. Find the flights that left earliest.
arrange(flights, desc(arr_delay), dep_delay)
  1. Sort flights to find the fastest flights.
arrange(flights, air_time)
  1. Which flights travelled the longest? Which travelled the shortest?
arrange(flights, desc(distance))
arrange(flights, distance)

5.4 Select columns with select()

5.4.1 Exercises

  1. Brainstorm as many ways as possible to select dep_time, dep_delay, arr_time, and arr_delay from flights.
select(flights, dep_time, dep_delay, arr_time, arr_delay)
select(flights, 4, 6, 7, 9)
select(flights, starts_with("dep_"), starts_with("arr_"))

Could keep going with a minus operator to drop all of the other columns, etc.

  1. What happens if you include the name of a variable multiple times in a select() call?
select(flights, dep_time, dep_time)

It doesn’t duplicate the variable.

  1. What does the one_of() function do? Why might it be helpful in conjunction with this vector?

It allows selection of variables by matching against a vector of strongs. In the code below I’ve used it to select all of the variables that aren’t listed in the vector.

vars <- c("year", "month", "day", "dep_delay", "arr_delay")
select(flights, -one_of(vars))
  1. Does the result of running the following code surprise you? How do the select helpers deal with case by default? How can you change that default?
select(flights, contains("TIME"))

By default the select helpers are case-insensitive. It can be modified by passing the argument ignore.case = FALSE.

select(flights, contains("TIME", ignore.case = FALSE))
LS0tCnRpdGxlOiAiQ2hhcHRlciA1OiBEYXRhIFRyYW5zZm9ybWF0aW9uIgpvdXRwdXQ6IGh0bWxfbm90ZWJvb2sKLS0tCgpgYGB7cn0KbGlicmFyeShueWNmbGlnaHRzMTMpCmxpYnJhcnkodGlkeXZlcnNlKQpgYGAKCiMgNS4yIEZpbHRlciByb3dzIHdpdGggYGZpbHRlcigpYAoKIyMgNS4yLjQgRXhlcmNpc2VzCgoxLiBGaW5kIGFsbCBmbGlnaHRzIHRoYXQ6CgphLiBIYWQgYW4gYXJyaXZhbCBkZWxheSBvZiB0d28gb3IgbW9yZSBob3VycwoKYGBge3J9CmZpbHRlcihmbGlnaHRzLCBhcnJfZGVsYXkgPj0gMTIwKQpgYGAKCmIuIEZsZXcgdG8gSG91c3RvbiAoSUFIIG9yIEhPVSkKCmBgYHtyfQpmaWx0ZXIoZmxpZ2h0cywgZGVzdCAlaW4lIGMoIklBSCIsICJIT1UiKSkKYGBgCgpjLiBXZXJlIG9wZXJhdGVkIGJ5IFVuaXRlZCwgQW1lcmljYW4sIG9yIERlbHRhCgpgYGB7cn0KYWlybGluZXMKZmlsdGVyKGZsaWdodHMsIGNhcnJpZXIgJWluJSBjKCJVQSIsICJBQSIsICJETCIpKQpgYGAKCmQuIERlcGFydGVkIGluIHN1bW1lciAoSnVseSwgQXVndXN0LCBhbmQgU2VwdGVtYmVyKQoKYGBge3J9CmZpbHRlcihmbGlnaHRzLCBtb250aCAlaW4lIGMoNywgOCwgOSkpCmBgYAoKZS4gQXJyaXZlZCBtb3JlIHRoYW4gdHdvIGhvdXJzIGxhdGUsIGJ1dCBkaWRu4oCZdCBsZWF2ZSBsYXRlCgpgYGB7cn0KZmlsdGVyKGZsaWdodHMsIGFycl9kZWxheSA+IDEyMCwgZGVwX2RlbGF5IDw9IDApCmBgYAoKZi4gV2VyZSBkZWxheWVkIGJ5IGF0IGxlYXN0IGFuIGhvdXIsIGJ1dCBtYWRlIHVwIG92ZXIgMzAgbWludXRlcyBpbiBmbGlnaHQKCmBgYHtyfQpmaWx0ZXIoZmxpZ2h0cywgYXJyX2RlbGF5ID49IDYwLCBhcnJfZGVsYXkgLSBkZXBfZGVsYXkgPCAtMzApCmBgYAoKZy4gRGVwYXJ0ZWQgYmV0d2VlbiBtaWRuaWdodCBhbmQgNmFtIChpbmNsdXNpdmUpCgpgYGB7cn0KZmlsdGVyKGZsaWdodHMsIGRlcF90aW1lIDw9IDYwMCB8IGRlcF90aW1lID09IDI0MDApCmBgYAoKMi4gQW5vdGhlciB1c2VmdWwgZHBseXIgZmlsdGVyaW5nIGhlbHBlciBpcyBgYmV0d2VlbigpYC4gV2hhdCBkb2VzIGl0IGRvPyBDYW4geW91IHVzZSBpdCB0byBzaW1wbGlmeSB0aGUgY29kZSBuZWVkZWQgdG8gYW5zd2VyIHRoZSBwcmV2aW91cyBjaGFsbGVuZ2VzPwoKKipgYmV0d2VlbigpYCByZXR1cm5zIGl0ZW1zIHRoYXQgYXJlaGF2ZSBhIHZhcmlhYmxlIHZhbHVlIGJldHdlZW4gdHdvIGJvdW5kYXJ5IHZhbHVlcyAoaW5jbHVzaXZlLCBpZSBpdCB0ZXN0cyBmb3IgYD49YCBhbmQgYDw9YCBvbiB0aGUgbGVmdCBhbmQgcmlnaHQgYm91bmRhcmllcykuKioKCmQuIERlcGFydGVkIGluIHN1bW1lciAoSnVseSwgQXVndXN0LCBhbmQgU2VwdGVtYmVyKQoKYGBge3J9CmZpbHRlcihmbGlnaHRzLCBiZXR3ZWVuKG1vbnRoLCA3LCA5KSkKYGBgCgozLiBIb3cgbWFueSBmbGlnaHRzIGhhdmUgYSBtaXNzaW5nIGBkZXBfdGltZWA/IFdoYXQgb3RoZXIgdmFyaWFibGVzIGFyZSBtaXNzaW5nPyBXaGF0IG1pZ2h0IHRoZXNlIHJvd3MgcmVwcmVzZW50PwoKYGBge3J9CmZpbHRlcihmbGlnaHRzLCBpcy5uYShkZXBfdGltZSkpCmBgYAoKKipUaGVzZSByb3dzIGFyZSBhbHNvIG1pc3NpbmcgYGRlcF9kZWxheWAsIGBhcnJfdGltZWAsIGBhcnJfZGVsYXlgIGFuZCBgYWlyX3RpbWVgLiBHaXZlbiB0aGF0IHRoZXkgaGF2ZSBhbGwgdGhlIHNjaGVkdWxlZCBkZXRhaWxzIGJ1dCBhcmUgbWlzc2luZyBhbGwgYWN0dWFsIGZsaWdodCBkYXRhLCB0aGVzZSByb3dzIGFwcGVhciB0byByZXByZXNlbnQgY2FuY2VsbGVkIGZsaWdodHMuKioKCjQuIFdoeSBpcyBgTkEgXiAwYCBub3QgbWlzc2luZz8gV2h5IGlzIGBOQSB8IFRSVUVgIG5vdCBtaXNzaW5nPyBXaHkgaXMgYEZBTFNFICYgTkFgIG5vdCBtaXNzaW5nPyBDYW4geW91IGZpZ3VyZSBvdXQgdGhlIGdlbmVyYWwgcnVsZT8gKGBOQSAqIDBgIGlzIGEgdHJpY2t5IGNvdW50ZXJleGFtcGxlISkKCmBgYHtyfQpOQSBeIDAKTkEgfCBUUlVFCk5BIHwgRkFMU0UgIyBDb3VudGVyLWV4YW1wbGUKRkFMU0UgJiBOQQpUUlVFICYgTkEgIyBDb3VudGVyLWV4YW1wbGUKTkEgKiAwCmBgYAoKKipgTkEgXiAwYCBpcyBub3QgbWlzc2luZyBiZWNhdXNlIGFueSB2YWx1ZSB0byB0aGUgcG93ZXIgb2YgemVybyBlcXVhbHMgMSAoYWx0aG91Z2ggSSBkb24ndCBoYXZlIGFuIGV4cGxhbmF0aW9uIHJpZ2h0IG5vdyB3aHkgdGhlIHNhbWUgcHJpbmNpcGxlIGRvZXNuJ3QgYXBwbHkgZm9yIGBOQSAqIDBgKS4gYE5BIHwgVFJVRWAgaXMgbm90IG1pc3NpbmcgYmVjYXVzZSBvbmx5IG9uZSBzaWRlIG9mIHRoZSAnb3InIG9wZXJhdG9yIG5lZWRzIHRvIGV2YWx1YXRlIGFzIHRydWUgKGNvbnZlcnNlbHksIG5vdGUgdGhhdCBgTkEgfCBGQUxTRWAgaXMgbWlzc2luZykuIGBGQUxTRSAmIE5BYCBpcyBub3QgbWlzc2luZyBiZWNhdXNlIGJvdGggc2lkZXMgb2YgdGhlICdhbmQnIG9wZXJhdG9yIHdvdWxkIG5lZWQgdG8gZXZhbHVhdGUgVFJVRSBmb3IgaXQgdG8gYmUgdHJ1ZSBzbyBiYXNlZCBvbiB0aGUgcmlnaHQtaGFuZCBzaWRlIGl0IHdvdWxkIGJlIGZhbHNlIHJlZ2FyZGxlc3Mgb2YgdGhlIGFjdHVhbCB2YWx1ZSBvZiB0aGUgYE5BYCAoY29udmVyc2VseSwgbm90ZSB0aGF0IGBUUlVFICYgTkFgIGlzIG1pc3NpbmcpLioqCgojIDUuMyBBcnJhbmdlIHJvd3Mgd2l0aCBgYXJyYW5nZSgpYAoKIyMgNS4zLjEgRXhlcmNpc2VzCgoxLiBIb3cgY291bGQgeW91IHVzZSBgYXJyYW5nZSgpYCB0byBzb3J0IGFsbCBtaXNzaW5nIHZhbHVlcyB0byB0aGUgc3RhcnQ/IChIaW50OiB1c2UgYGlzLm5hKClgKS4KCioqVXNpbmcgYGRlcF90aW1lYCBhcyBhbiBleGFtcGxlLioqCgpgYGB7cn0KYXJyYW5nZShmbGlnaHRzLCBkZXNjKGlzLm5hKGRlcF90aW1lKSkpCmBgYAoKMi4gU29ydCBmbGlnaHRzIHRvIGZpbmQgdGhlIG1vc3QgZGVsYXllZCBmbGlnaHRzLiBGaW5kIHRoZSBmbGlnaHRzIHRoYXQgbGVmdCBlYXJsaWVzdC4KCmBgYHtyfQphcnJhbmdlKGZsaWdodHMsIGRlc2MoYXJyX2RlbGF5KSwgZGVwX2RlbGF5KQpgYGAKCjMuIFNvcnQgZmxpZ2h0cyB0byBmaW5kIHRoZSBmYXN0ZXN0IGZsaWdodHMuCgpgYGB7cn0KYXJyYW5nZShmbGlnaHRzLCBhaXJfdGltZSkKYGBgCgo0LiBXaGljaCBmbGlnaHRzIHRyYXZlbGxlZCB0aGUgbG9uZ2VzdD8gV2hpY2ggdHJhdmVsbGVkIHRoZSBzaG9ydGVzdD8KCmBgYHtyfQphcnJhbmdlKGZsaWdodHMsIGRlc2MoZGlzdGFuY2UpKQpgYGAKCmBgYHtyfQphcnJhbmdlKGZsaWdodHMsIGRpc3RhbmNlKQpgYGAKCiMgNS40IFNlbGVjdCBjb2x1bW5zIHdpdGggYHNlbGVjdCgpYAoKIyMgNS40LjEgRXhlcmNpc2VzCgoxLiBCcmFpbnN0b3JtIGFzIG1hbnkgd2F5cyBhcyBwb3NzaWJsZSB0byBzZWxlY3QgYGRlcF90aW1lYCwgYGRlcF9kZWxheWAsIGBhcnJfdGltZWAsIGFuZCBgYXJyX2RlbGF5YCBmcm9tIGZsaWdodHMuCgpgYGB7cn0Kc2VsZWN0KGZsaWdodHMsIGRlcF90aW1lLCBkZXBfZGVsYXksIGFycl90aW1lLCBhcnJfZGVsYXkpCmBgYAoKYGBge3J9CnNlbGVjdChmbGlnaHRzLCA0LCA2LCA3LCA5KQpgYGAKCmBgYHtyfQpzZWxlY3QoZmxpZ2h0cywgc3RhcnRzX3dpdGgoImRlcF8iKSwgc3RhcnRzX3dpdGgoImFycl8iKSkKYGBgCgoqKkNvdWxkIGtlZXAgZ29pbmcgd2l0aCBhIG1pbnVzIG9wZXJhdG9yIHRvIGRyb3AgYWxsIG9mIHRoZSBvdGhlciBjb2x1bW5zLCBldGMuKioKCjIuIFdoYXQgaGFwcGVucyBpZiB5b3UgaW5jbHVkZSB0aGUgbmFtZSBvZiBhIHZhcmlhYmxlIG11bHRpcGxlIHRpbWVzIGluIGEgYHNlbGVjdCgpYCBjYWxsPwoKYGBge3J9CnNlbGVjdChmbGlnaHRzLCBkZXBfdGltZSwgZGVwX3RpbWUpCmBgYAoKKipJdCBkb2Vzbid0IGR1cGxpY2F0ZSB0aGUgdmFyaWFibGUuKioKCjMuIFdoYXQgZG9lcyB0aGUgYG9uZV9vZigpYCBmdW5jdGlvbiBkbz8gV2h5IG1pZ2h0IGl0IGJlIGhlbHBmdWwgaW4gY29uanVuY3Rpb24gd2l0aCB0aGlzIHZlY3Rvcj8KCioqSXQgYWxsb3dzIHNlbGVjdGlvbiBvZiB2YXJpYWJsZXMgYnkgbWF0Y2hpbmcgYWdhaW5zdCBhIHZlY3RvciBvZiBzdHJvbmdzLiBJbiB0aGUgY29kZSBiZWxvdyBJJ3ZlIHVzZWQgaXQgdG8gc2VsZWN0IGFsbCBvZiB0aGUgdmFyaWFibGVzIHRoYXQgX2FyZW4ndF8gbGlzdGVkIGluIHRoZSB2ZWN0b3IuKioKCmBgYHtyfQp2YXJzIDwtIGMoInllYXIiLCAibW9udGgiLCAiZGF5IiwgImRlcF9kZWxheSIsICJhcnJfZGVsYXkiKQpzZWxlY3QoZmxpZ2h0cywgLW9uZV9vZih2YXJzKSkKYGBgCgo0LiBEb2VzIHRoZSByZXN1bHQgb2YgcnVubmluZyB0aGUgZm9sbG93aW5nIGNvZGUgc3VycHJpc2UgeW91PyBIb3cgZG8gdGhlIGBzZWxlY3RgIGhlbHBlcnMgZGVhbCB3aXRoIGNhc2UgYnkgZGVmYXVsdD8gSG93IGNhbiB5b3UgY2hhbmdlIHRoYXQgZGVmYXVsdD8KCmBgYHtyfQpzZWxlY3QoZmxpZ2h0cywgY29udGFpbnMoIlRJTUUiKSkKYGBgCgoqKkJ5IGRlZmF1bHQgdGhlIGBzZWxlY3RgIGhlbHBlcnMgYXJlIGNhc2UtaW5zZW5zaXRpdmUuIEl0IGNhbiBiZSBtb2RpZmllZCBieSBwYXNzaW5nIHRoZSBhcmd1bWVudCBgaWdub3JlLmNhc2UgPSBGQUxTRWAuKioKCmBgYHtyfQpzZWxlY3QoZmxpZ2h0cywgY29udGFpbnMoIlRJTUUiLCBpZ25vcmUuY2FzZSA9IEZBTFNFKSkKYGBgCg==